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Project  Objective 

The  threat  of  atmospheric  contamination  by  hazardous  materials  remains  a  high  national 
security  concern.  There  is  a  strong  need  for  the  development  of  emerging  technologies 
which  can  significantly  advance  risk  assessment  and  response  capabilities.  Thus,  the 
objective  of  this  project  was  the  development  and  validation  of  SCIPUFF  tangent-linear 
or  adjoint  model  as  a  counterproliferation  and  counterforce  tool  for  characterizing  the 
source  of  a  nuclear,  biological,  chemical  (NBC)  release  using  observational  data.  A 
fundamental  aspect  of  gradient-based  inverse  modeling  methods  are  the  sensitivities  of 
the  ‘mismatch’  between  numerical  simulation  and  measurement  for  observables  of  a 
hazardous  release  to  model  parameters  and  inputs.  Thus,  another  objective  of  this 
research  project  was  to  use  the  SCIPUFF  tangent-linear  or  adjoint  model  to  further  the 
current  understanding  of  the  relationships  between  field  observations,  the  characteristics 
of  the  release  source,  and  the  transport  and  dispersion  of  the  hazardous  cloud. 

In  this  project,  Aerodyne  Research  Inc.  (ARI)  has  developed  and  demonstrated  an 
algorithm  for  source  estimation,  called  AIMS  (“Aerodyne  Inverse  Modeling  System”). 
AIMS  takes  as  input  all  available  observational  data  and  optionally  any  prior  knowledge 
of  the  source  parameters.  The  output  is  the  set  of  source  parameters  that  best  describes 
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the  observations,  including  number  of  sources,  emission  rates,  locations,  start  and  end 
times.  AIMS  is  also  designed  to  include  an  a  posteriori  assessment  of  its  solution  quality, 
providing  useful  feedback  on  how  much  confidence  to  put  in  a  particular  solution  and  in 
what  ways  the  solution  quality  might  be  improved.  A  novel  feature  in  AIMS  is  the 
ability  to  integrate  multiple  observation  types  in  order  to  maximize  information  content 
for  source  estimation.  This  capability  has  been  demonstrated  for  datasets  from  stationary 
and  mobile  sensors. 

AIMS  provides  a  significantly  improved  capability  to  protect  the  warfighter  and  civilians 
through  successful  identification  and  quantification  of  unknown  hazardous  atmospheric 
releases. 


Background 

The  standard  modeling  approach  for  the  tracking  of  atmospheric  plumes  falls  in  the 
category  of  forward  modeling:  given  an  initial  state  (such  as  the  atmospheric  state,  an 
initial  tracer  concentration),  and,  possibly  time-dependent  boundary  values  (such  as 
space-/time-dependent  tracer  emissions  and,  in  case  of  an  offline  meteorological 
background  the  time-evolving  meteorological  state),  a  transport  model  is  stepped  forward 
in  time  to  produce  a  field  of  tracer  concentrations  at  subsequent  times. 

Forward  atmospheric 


transport  &  dispersion  model 

Predicted 

concentration 

Known  gas 

(SCIPUFF) 

sources 

field 

Inverse  model 
based  on  SCIPUFF 
(AIMS) 

Figure  1.  An  illustration  of  atmospheric  dispersion  modeling  in  forward  and  inverse  modes.  In  the 
forward  mode,  the  gas  source  characteristics  are  well-known  and  the  goal  is  to  predict  concentrations 
at  a  later  time  and  downwind  of  the  emission  sources.  In  the  inverse  mode,  downwind  concentrations 
are  known  from  measurements  and  the  goal  is  to  determine  the  source  characteristics. 


Measured 

concentration 

field 


Source  locations, 
magnitudes, 
times? 


In  source  estimation,  one  wishes  to  compare  the  time-evolved  tracer  concentrations  with 
a  set  of  measurements  of  these  quantities.  This  sort  of  problem  falls  in  the  category  of 
inverse  modeling  -  the  goal  is  to  determine  the  initial  and/or  boundary  values  that  lead  to 
a  model  trajectory  most  consistent  with  the  observations.  So,  given  the  measurement 
data,  one  wishes  to  identify  and  quantify  the  emission  sources.  Among  other  benefits, 
inverse  modeling  enables  more  accurate  assessment  of  the  impacts  of  emissions  events: 
unknown  inputs  (e.g.  emission  rates/quantities)  required  for  the  forward  modeling  can  be 
automatically  determined  from  measurement  data  using  inverse  modeling.  Inverse 
atmospheric  dispersion  modeling  is  an  area  of  active  research  [Bowen  and  Stratton 
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(1981),  Daescu  and  Carmichael  (2003),  Errico  (1997),  Long  et  al.  (2010),  Lushi  and 
Stockie  (2010),  Mavriplis  (2007)]. 


The  forward  model  of  interest  in  this  work  is  SCIPUFF  (Second-Order  Closure  Integrated 
PUFF,  [Sykes  et  al.  (1985)]),  which  is  a  state-of-the-art  atmospheric  transport  and 
dispersion  model  adopted  by  the  Defense  Threat  Reduction  Agency  (DTRA).  As 
described  in  the  next  section,  SCIPUFF  is  equipped  to  model  a  variety  of  emission 
scenarios  involving  varying  characteristics  (locations,  rates  and  times)  and  under  wind 
conditions  that  vary  spatially  and  temporally. 


SCIPUFF  Atmospheric  Transport  and  Dispersion  Model 


The  atmospheric  dispersion  model  is  the  Second-order  Closure  Integrated  Puff 
(SCIPUFF)  model  [Sykes  et  al.,  (1986,  1995,  1999)]  which  is  the  dispersion  model 
employed  in  the  Hazard  Prediction  and  Assessment  Code  Capability  (HP AC)  [SAIC, 
2001].  While  SCIPUFF  initially  treated  gaseous  puffs  evolving  in  the  atmosphere, 
interactions  with  other  materials  are  included  in  current  versions.  This  includes  solid 
particles,  liquid  droplets,  and  nuclear  material  and  the  ability  to  treat  evaporation, 
deposition,  and  precipitation.  Release  sources  include  both  continuous  plumes  and 
instantaneous  puffs.  A  single  calculation  allows  for  any  combination  of  continuous  and 
instantaneous  releases  at  any  time  throughout  the  temporal  domain.  Continuous  releases 
are  modeled  as  a  series  of  single  puffs  releases  one  after  another 

The  concentration  field  is  modeled  using  the  Gaussian  puff  method  [Bass  (1980)].  The 
concentration  field  is  given  by  the  summation  of  individual  puffs,  and  the  concentration 
for  each  puff,  at  a  given  time,  is 


c(.v)  = 


(1) 


In  Equation  1,  Q  is  the  puff  mass,  x  is  the  puff  centroid,  and  a  is  the  puff  spread.  The 
dynamic  relations  are  derived  by  moment  methods,  where  the  conservation  equations  are 
integration  over  all  space.  Turbulent  diffusion  is  modeled  by  the  second-order  schemes  of 
Donaldson  [1973]  and  Lewellen  [1977]  where  the  second  order  fluctuation  terms  are 
solved  for  by  a  corresponding  transport  equation.  The  closure  scheme  provides  a 
statistical  variance  of  the  concentration  field,  and  the  variance  provides  probability 
information.  Thus,  the  SCIPUFF  calculations  provide  mean  concentration  values  together 
with  a  corresponding  uncertainty  estimate.  Practical  applications  of  the  model  on  50  km 
and  3000  km  length  scales  have  been  performed  [Sykes  et  al.,  (1986,  1995,  1999)]. 

Numerical  techniques  used  include  puff  splitting,  merging,  and  adaptive  time  steps. 
Because  the  puff  centroid  evolves  based  on  the  mean  flow  velocity  at  the  centroid 
location,  the  Gaussian  representation  becomes  less  accurate  as  the  puff  grows.  In  order  to 
overcome  this  problem,  if  a  puff  reaches  a  critical  width,  the  puff  is  split  into  smaller 
puffs.  The  summation  of  small  puffs  can  be  used  to  accurately  represent  a  generalized 
concentration  field.  As  the  smaller  puffs  evolve,  if  the  puffs  have  a  region  of  considerable 
overlap,  then  they  are  recombined  or  merged  to  a  single  Gaussian  distribution.  As 
different  puffs  evolve,  each  is  exposed  to  a  different  dynamic  environment,  and  as  such, 
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different  puffs  may  have  different  timescales  for  numerical  integration.  The  adaptive  time 
stepping  scheme  allows  for  individual  puffs  to  be  stepped  at  individual  time  step  levels. 

Gaseous  materials  released  into  the  atmosphere  in  many  cases  have  an  initial  vertical 
momentum  and  an  initial  temperature  different  than  those  of  the  surroundings.  SCIPUFF 
has  an  option  to  include  buoyancy  effects  in  the  calculations.  The  buoyancy  effects  are 
calculated  with  evolution  equations  instead  of  assigning  the  buoyant  puff  characteristics 
to  the  source.  For  instance,  an  effective  release  height  is  not  used  in  the  model.  Instead 
the  actual  release  height  is  used  and  subsequent  change  in  puff  centroid  with  altitude  is 
calculated  by  solving  the  Boussinesq  momentum  equation  and  the  conservation  of 
potential  temperature. 

Meteorological  input  comes  in  a  variety  of  forms:  fixed  winds  (or  uniform  velocity 
vector),  surface  observational  data,  upper  air  observational  data,  and  three-dimensional 
gridded  data.  Observational  data  includes  mean  wind,  temperature,  and  possibly 
boundary  layer  parameters.  Data  is  provided  for  a  discrete  number  of  locations  and  times. 
The  information  is  interpolated  over  space  and  time  to  give  the  relevant  quantity  at  the 
centroid  of  the  puff  of  interest.  An  option  for  mass-consistency  exists  to  enforce  mass 
conservation  with  respect  to  the  interpolated  wind  field. 

Additional  meteorological  options  include  boundary  layer  and  terrain.  Boundary  layer 
input  parameters  include  surface  roughness  length,  boundary  layer  depth,  Monin- 
Obukhov  length,  and  surface  friction  velocity.  Given  these  parameters,  the  flow  field  in 
the  planetary  boundary  layer  is  calculated.  The  relations  are  based  on  the  modeling  work 
of  Wyngaard  [1985],  Venkatram  and  Wyngaard  [1988],  and  Lewellen  [1981].  The  terrain 
option  allows  for  a  surface  with  varying  height  and  properties.  Surface  elevation  is  read 
from  a  data  file,  and  surface  properties  are  set  either  based  on  the  type  of  land  cover 
(water,  forest,  cultivated,  grassland,  urban,  desert)  or  are  read  from  a  data  file. 


Prior  Development 

Under  a  DTRA  sponsored  Phase  II  SBIR  research  project,  automatic  differentiation  tools 
were  used  to  develop  the  adjoint  and  tangent  linear  models  for  the  SCIPUFF  atmospheric 
transport  and  dispersion  modeling  component  of  Defense  Threat  Reduction  Agency’s 
Hazard  Prediction  and  Assessment  Capability  (HPAC).  Currently,  the  SCIPUFF  adjoint 
model  includes  as  controls  the  release  source  latitude,  longitude,  and  height,  as  well  as 
the  release  mass  and  release  time.  The  model  was  tested  using  Dipole  Pride  26  field  data. 
For  that  data  set,  the  observables  were  the  measured  tracer  concentration  downwind  of 
the  release  from  a  set  of  whole  air  sampling  bags  located  up  to  20  km  from  the  source. 

Under  the  Phase  II  study,  both  (a)  parameter  optimization  and  (b)  source  location 
applications  were  explored.  The  distinction  between  the  two  applications  is  based  on  how 
much  information  is  known  about  the  specific  control  variables.  For  parameter 
optimization,  the  focus  is  on  iterative  adjustments  in  model  input  parameters  which  are 
known  to  within  some  degree  of  uncertainty  in  order  to  minimize  the  difference  between 
model  results  and  measured  observables.  For  example,  measured  wind  speed  and 
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direction  at  selected  locations  would  be  examples  of  model  input  parameters  which 
known  to  some  extent,  but  which  are  subject  to  sufficient  uncertainty  to  significantly 
impact  model  results.  Parameter  optimization  could  adjust  these  parameters  within  the 
uncertainty  bounds  to  improve  model  simulations. 

Figure  2  illustrates  source  location  optimization  using  the  SCIPUFF  adjoint  model  for 
DP26  field  tests  as  a  scatter  plot.  The  x-axis  is  the  ratio  of  the  predicted  (optimized) 
source  longitude  to  the  report  (actual)  longitude.  The  y-axis  is  the  ratio  of  the  predicted 
(optimized)  source  latitude  to  the  report  (actual)  latitude.  Contours  are  also  shown  which 
delineate  regions  for  which  the  distance  between  the  predicted  and  reported  source 
locations  are  less  than  some  value  in  units  of  km.  Thus,  for  most  of  the  DP26  field  trials, 
the  predicted  source  location  was  within  one  kilometer  of  the  actual  release.  Since  the 
actual  release  locations  are  known,  the  data  in  Figure  2  means  that  differences  between 
forward  model  calculations  and  measured  release  tracer  concentrations  result  in  predicted 
source  locations  which  are  within  1-2  km  of  the  actual  release. 


Scatter  in  Predicted  Source  Location 
All  4  coutour  sets  shown 


Figure  2:  Scatter  in  SCIPUFF  adjoint  optimized  source  latitude  and  longitude  for  DP26  field  data. 

For  the  results  in  Figure  2,  initial  guesses  for  the  source  location  were  selected  near  the 
known  release  site.  The  SCIPUFF  adjoint  model  was  then  used  to  iteratively  adjust  the 
source  location  to  minimize  the  cost  function.  Locating  the  source  of  a  hazardous  release 
using  field  observations  when  there  is  no  information  about  parameter  values  is  more 
difficult.  There  are  several  potential  approaches  to  this  problem.  One  approach  is  a 
simple  grid  search  for  local  minima  for  the  cost  function  to  map  potential  source 
locations.  A  second  approach  is  to  combine  source  optimizations  as  described  above  with 
results  from  other  methods  such  as  forward  trajectory  calculations  which  are  being 
pursued  by  other  groups.  A  third  approach  is  an  iterative  optimization  in  which  a  set  of 
initial  parameter  values  are  first  selected  and  then  optimized.  The  set  of  optimized 
parameters  are  then  perturbed  to  define  initial  parameters  for  a  second  optimization.  The 
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process  is  repeated  leading  to  a  map  of  local  minima  for  the  cost  function  through 
parameter  space.  This  approach  is  illustrated  schematically  in  Figure  3  in  which  the 
starting  location  is  at  a  selected  observational  site  or  sensor  location.  This  method  was 
examined  as  part  of  the  Phase  II  SBIR  work  and  illustrative  results  for  two  of  DP26  trials 
with  results  summarized  in  Table  1 


Figure  3.  Schematic  of  iterative  backtracking  approach  to  source  location. 


Table  1.  Summary  of  Predicted  Release  Source  Parameters  for  DP26  Trials  11B. 


Control  Variable 

Reported  Value 

Predicted  Value 

Absolute  Difference 

D26  Trial  11B 

Release  latitude  (El 

-116.0970 

-116.0964 

0.0006 

Releave  longitude  (N1 

37.159S6 

37.1574 

0.0025 

Releave  tune  (UTC) 

23.45 

23.46 

0.01 

Benefit  to  Warfighter 

The  SCIPUFF  inverse  model  would  benefit  the  warfighter  in  two  ways.  The  first  is 
characterization  of  an  unknown  hazardous  release  source.  The  second  is  improved 
atmospheric  transport  and  dispersion  modeling  capabilities  through  model  parameter 
optimization. 

Maturity  of  the  Technology  Prior  to  this  Project 

Prior  to  this  project,  a  tangent-linear/adjoint  for  a  simplified  version  of  SCIPUFF 
atmospheric  transport  and  dispersion  model  had  been  developed  and  tested  using  field 
data.  However,  for  source  location  applications,  the  model  was  in  a  very  early  stage  of 
development  and  a  number  of  scientific  and  technical  challenges  needed  to  be  addressed 
to  enable  further  development  for  the  desired  (source  location/quantification) 
applications. 
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Project  Accomplishments:  The  Aerodyne  Inverse 
Modeling  System  (AIMS) 

Overview 

The  source  of  an  atmospheric  release  can  be  described  by  using  the  following  parameters: 
source  location,  source  height,  mass  released,  time  of  release  and  duration  of  release. 
Estimating  a  source  refers  to  estimating  all  or  some  of  these  parameters.  We  have 
developed  an  advanced  source  estimation  algorithm  named  AIMS  (Aerodyne  Inverse 
Modeling  System).  As  illustrated  in  Figure  4,  AIMS  is  based  on  SCIPUFF  and  takes  as 
input  all  available  measurement  data  and  optionally  any  prior  estimate  of  the  source 
parameters.  The  output  is  the  best  set  of  source  parameters  for  reproducing  the 
measurement  data  using  the  forward  model.  Model  outputs  include  number  of  sources, 
emission  rates,  locations,  start  and  end  times.  AIMS  is  also  designed  to  output  an 
assessment  of  its  solution  quality,  providing  useful  feedback  on  how  much  confidence  to 
put  in  a  particular  solution  and  in  what  ways  the  solution  quality  might  be  improved  (e.g. 
acquiring  more  measurement  data). 


•Obs.  Data 
(MET,  Sensors) 


Figure  4:  Schematic  of  the  Aerodyne  Inverse  Modeling  System  (AIMS) 

A  novel  feature  in  AIMS  is  its  ability  to  integrate  multiple  observation  types  in  order  to 
maximize  information  content  for  source  estimation. 


The  Cost  Function 

The  quality  of  a  source  estimate  can  be  evaluated  by  using  a  cost  function  to  compare 
observed  concentration  data  and  model  predictions  when  the  source  estimate  is  used  in  a 
forward  calculation.  Typically,  the  cost  function  is  defined  to  be  zero  in  the  limit  that 
model  predictions  match  the  data  perfectly  (i.e.  when  the  model  perfectly  predicts  the 
measurements  at  all  sensors  at  all  times),  and  increase  as  the  quality  of  the  source 
estimation  worsens.  The  cost  function  developed  and  used  in  AIMS  is  as  follows: 
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Nmeasurements 


Cost  =  ^ 


m= 1 


I' 

/=1 


/ sensors  2 

I  [r,?-ff(0)]~ 

j= 1 


Nsensors  0 

I  (fft 

0.05*  max  V;. 

^  Nsensors  ~  ^ 

I  (y,7) 

1 

l  7=1  7 

J 

(2) 


where  Y°bs  refers  to  the  observed  concentrations  and  Y"'od  refers  to  the  model  predicted 
concentrations  at  time  i ,  at  sensor  j  ,  obtained  by  running  SCIPUFF  (forward  model)  at  a 
set  of  source  parameters  j3\  and  am  is  a  weighting  constant  for  each  type  of  measurement. 
Note  that  the  total  cost  in  equation  (2)  is  really  a  weighted  sum  of  component  cost 
functions  m,  for  the  different  measurement  types  applied  in  the  source  estimation 
problem.  This  is  to  allow  case-dependent  specification  of  the  desired  relative  impacts  of 
the  different  observation  types  in  a  dataset.  For  instance,  one  may  wish  to  include  in  a 
source  estimation  analysis  stationary  sensor  measurements  known  to  be  highly  uncertain, 
because  the  stationary  sensors  were  at  locations  not  covered  by  the  more  accurate,  less 
noisy  mobile  concentration  measurements.  However,  one  wishes  to  do  so  without  “over¬ 
contaminating”  the  source  estimation  with  the  large  errors  in  the  stationary  measurement. 
This  can  be  accomplished  in  AIMS  by  specifying  appropriate  weighting  factors  (e.g.  am  = 
0.9  and  0.1  for  the  mobile  and  stationary  concentration  measurements,  respectively). 
Unless  otherwise  specified  by  the  user,  all  measurement  types  are  equally  weighted  in 


AIMS;  i.e.  am  = - - - 

^ measurements 


Vm . 


The  current  version  of  AIMS  is  able  to  treat  both  stationary  and  mobile  concentration 
measurements.  Our  goal  is  to  expand  the  capability  to  treat  more  observation  types  in 
future  versions  of  AIMS. 


Additionally,  notice  in  Equation  (2)  that  a  minimum  value  for  the  denominator  is  set  to  be 
equal  to  5%  of  the  largest  denominator.  This  is  applied  as  a  scaling  mechanism  to 
account  for  the  potentially  vast  range  of  data  values  from  the  sensors  over  the  course  of 
the  measurement  period.  In  other  words,  applying  this  scaling  avoids  artificial  inflation  of 
the  cost  function  by  low-magnitude  measurements  at  time  i:  notice  that 

Nsensors  2 

as  lij  ( KT )  — *  0 ,  the  total  cost  is  dominated  by  the  model-data  discrepancies  at  time 

7=1 

i,  even  for  relatively  low  values  of  the  numerator.  On  the  other  hand,  omitting  the 
denominator  in  Equation  (2)  causes  the  cost  function  to  be  dominated  by  the  absolute 
model-data  discrepancies  in  the  larger-magnitude  measurements,  while  the  lower- 
magnitude  data  are  essentially  ignored.  In  that  case,  one  would  often  obtain  source 
estimates  with  large  model-data  discrepancies  at  the  lower-magnitude  data  points.  The 
scaling  mechanism  described  above  was  found  to  be  highly  effective  for  simultaneously 
maximizing  the  information  content  of  the  wide  range  of  data  values  typically  found  in 
practical  observational  datasets,  for  source  estimation. 
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AIMS  minimizes  the  cost  function  described  in  equation  (2)  using  the  Broyden-Fletcher- 
Goldfarb-Shanno  algorithm  [Press  et  al.,  (1992)],  which  uses  a  quasi-Newton  method. 
The  minimization  algorithm  requires  multiple  evaluations  of  the  cost  function,  as  well  as 
its  derivatives  with  respect  to  the  unknown  source  parameters  (3.  AIMS  accomplishes 
this  non-trivial  task  of  obtaining  the  gradients  of  a  complex  software  output  with  respect 
to  its  inputs  by  employing  an  Automatic  Differentiation  (AD)  tool  called  TAF 
(Transformation  of  Algorithms  in  FORTRAN  R.  [Giering  and  Kaminski,  (2006)]  ). 


Automatic  Differentiation 

Automatic  Differentiation  allows  evaluation  of  the  derivatives  of  a  function  specified  by 
a  computer  program.  Note  that  this  is  not  a  trivial  matter,  since  one  often  does  not  have  a 
closed  form  expression  relating  outputs  to  inputs  in  numerical  models;  and  determining 
these  relations  is  typically  intractable,  particularly  in  large,  complex  models  like 
SCIPUFF.  AD  tools  take  as  input  the  source  codes  that  constitute  the  forward  model  and 
automatically  generate  corresponding  source  codes  for  computing  the  gradients. 

We  apply  the  AD  tool  called  TAF  [Giering  and  Kaminski,  (2006)].  TAF  exploits  the 
chain  rule  for  computing  the  first  derivative  of  a  function  with  respect  to  a  set  of  input 
variables.  Treating  a  given  forward  code  as  a  composition  of  operations  -  each  line 
representing  a  compositional  element,  the  chain  rule  is  rigorously  applied  to  the  code, 
line  by  line.  The  resulting  tangent  linear  or  adjoint  code  returns  the  lacobian  matrix 
computed  in  forward  (following  model  evaluation  steps  from  input  to  output)  or  reverse 
(backtracking  from  output  to  input)  order,  respectively.  Both  approaches  yield  accurate 
analytical  gradients,  though  the  tangent-linear  is  more  straight-forward  to  implement;  it  is 
also  more  computationally  efficient  when  computing  gradients  of  several  outputs  to  only 
a  few  inputs.  The  adjoint  mode  is  more  efficient  for  computing  gradients  of  a  few  outputs 
to  several  inputs  [Zhang  et  al.,  (2008)].  In  practice,  the  adjoint  mode  involves  some 
additional  computational  overhead  costs  and  only  outperforms  the  tangent-linear  when 
the  number  of  outputs  is  much  greater  than  the  number  of  inputs. 

A  little  linear  algebra  helps  to  understand  the  issue.  Let  M  be  a  general,  nonlinear  model, 

i.e.  a  mapping  from  the  m-dimensional  space  UcJRm  of  input  variables  . um  * 

(initial  conditions  such  as  concentrations,  boundary  conditions  such  as  emissions,  or 

model  parameters)  to  the  n-dimensional  space  ^  ^ '  of  model  output  variable 

v=(vt . v«)  (model  concentrations  or  diagnostics  thereof,  objective  functions  such  as  a 

model  vs.  data  misfit,  ...)  under  consideration, 

M:U->V_ 

h->v  =  A/(m)  (3) 

The  vectors  fie  L  and  ve  \  may  be  represented  with  respect  to  some  given  basis  vectors 
spon(U)—  . m  and  spfln(V)—  . „  as 
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m  n 

S=lMr  V=I  Vjfj 

1=1  .  7=1 

Two  routes  may  be  followed  to  determine  the  sensitivity  of  the  output  variable  v  to  its 
input  l\ :  forward  (or  direct  sensitivity)  and  reverse  (or  adjoint  sensitivity). 


“Forward”  or  “Tangent-Linear”  Sensitivity 

Consider  a  perturbation  to  the  input  variables  Si  (typically  a  single  component 
Si  =  SuIel ).  Their  effect  on  the  output  may  be  obtained  via  the  linear  approximation  of 
the  model  M  in  terms  of  its  Jacobian  matrix  M  ,  evaluated  in  the  point  u(0)  according  to 


5v=  M  u<°)5 a 


(4) 


with  resulting  output  perturbation  S  .  In  components  Mji=3M/5ui,  it  reads 


6\)=X  M  u(0)5//; 
i 


(5) 


Eq.  (4)  is  the  tangent-linear  model.  In  contrast  to  the  full  nonlinear  model  M,  the  operator 
M  is  just  a  matrix  which  can  readily  be  used  to  find  the  forward  sensitivity  ofv  to 
perturbations  in  u,  but  if  there  are  very  many  input  variables,  it  quickly  becomes 
prohibitive  to  proceed  directly  as  in  (4),  if  the  impact  of  each  component  e-,  is  to  be 
assessed.  As  an  example,  let  vj  be  the  concentration  at  a  point  xj  at  the  end  of  the  model 
integration,  u  a  field  of  initial  emissions,  and  ask  what  is  the  sensitivity  of  v,  to  ur,  i.e. 
what  is  the  change  S’  f  under  changes  Si .  To  obtain  a  full  picture,  one  would  have  to 

perturb  each  component  of  the  initial  field  du.j,  i=\,...,m  separately  and  perform  a  forward 
calculation  for  each  perturbed  state. 


“Reverse”  or  “Adjoint”  Sensitivity 

Let  us  consider  the  special  case  of  a  scalar  objective  function  J  (v )  of  the  model  output. 
This  scalar  is  either  the  concentration  at  a  certain  model  grid  point  as  in  the  above 
example,  or  a  measure  of  some  model-to-data  misfit.  The  description  of  the  model  as  in 
Eq.  (3)  can  then  be  extended  to  read 

J.U-+V-+R 

u-*v  =  \l(u )  i— » j(u )  =  j(m(u  ))  (6) 

The  perturbation  of  J  around  a  fixed  point  Jo,  J=Jo+SJ  can  be  expressed  in  both  bases  of 
u  and  v  with  respect  to  their  corresponding  inner  product 
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J  =  J\-  +  (VbJT^.6u)  +  0(5u:) 
«  u 

=  j\-  +  (VyJT  ,0-Sv)  +  O(Sy) 


V  (7) 

After  some  straightforward  algebra  the  gradient  VUJ  can  be  readily  inferred  by  invoking 
the  adjoint  M* of  the  tangent  linear  model  M 


=  mt 


=mt 


■Vv/J 


5v 


=  Su  (8) 

Eq.  (6)  is  the  adjoint  model,  in  which  M  is  the  adjoint  (here,  the  transpose)  of  the  tangent 
linear  operator  M,  Sv  *  the  adjoint  variable  of  the  model  state  v  ,  and  &  *  the  adjoint 
variable  of  the  control  variable  u  . 

The  reverse  nature  of  the  adjoint  calculation  can  be  readily  seen  as  follows.  Consider  a 
model  integration  which  consists  of  A  consecutive  operations  Ma(Ma- 

i( . (Mi(Mo( u  ))))),  where  the  M’s  could  be  the  elementary  steps,  i.e.  single  lines  in  the 

code  of  the  model,  or  successive  time  steps  of  the  model  integration,  starting  at  step  0  and 
moving  up  to  step  A,  with  intermediate  M,{  u  )=v  t/A  11  and  final  MA(  u)=v  <A+r>-  p  .  Let  J 
be  a  cost  function  that  explicitly  depends  on  the  final  state  v  only  (this  restriction  is  for 
clarity  reasons  only).  J(u)  may  be  decomposed  according  to: 

m*))=j(MA(Mu( . mmm  <<» 

Then,  according  to  the  chain  rule,  the  forward  calculation  reads,  in  terms  of  the  Jacobian 
matrices  (we’ve  omitted  the  I’s  which,  nevertheless  are  important  to  the  aspect  of  tangent 
linearity;  note  also  that  by  definition  ( .5v)=V ,J'5v) 


VrJ'M  A’ 


MrMfSu 


=V,Jdv 


(10) 


whereas  in  reverse  mode  we  have 
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Afr(v,/)=  \ll  \t\ • . M[  VrJT 

=V./ 

clearly  expressing  the  reverse  nature  of  the  calculation. 


(ID 


Source  Estimation  Algorithm  Heuristics 

Figure  5  summarizes  the  source  estimation  algorithm  implemented  in  AIMS.  Several 

heuristics  were  developed  to  improve  the  algorithm  robustness,  including: 

1 .  Automatic  starting  guess  heuristic  -  automatically  determines  an  appropriate  starting 
guess  of  the  unknown  source  parameters,  so  that  users  are  not  required  to  have  any 
prior  knowledge/guess  of  source  parameters.  This  algorithm  works  as  follows:  it 
starts  by  identifying  the  maximum  concentration  measured;  its  corresponding 
measurement  time  and  location  are  used  as  the  initial  guess,  together  with  a  low  mass 
value  (10' 10  Kg). 

2.  Cost  function  minimization  heuristics  -  to  separate  the  cost  minimization  with  respect 
to  release  time  from  its  minimization  with  respect  to  the  other  unknown  source 
parameters  (source  location,  mass  and  duration);  we  found  this  approach  to  improve 
the  source  estimation  success  rate.  This  algorithm  works  as  follows:  after  evaluating 
the  cost  for  the  initial  guess,  two  other  earlier  release  times  (determined 
automatically)  are  also  evaluated.  Thereafter,  the  time  search  will  be  expanded  on  the 
direction  of  the  lowest  cost  until  a  minimum  is  bounded.  Once  a  minimum  is 
bounded,  the  time  intervals  are  bisected  until  the  desired  time  interval  precision  is 
achieved.  The  best  estimated  source  is  the  one  that  yields  the  lowest  cost  function 
value.  When  dealing  with  multiple  sources,  the  above  procedure  is  repeated  up  to 
three  additional  times  (total  of  four  sources),  each  time  adding  a  starting  guess  for  a 
new  source  to  the  solution  from  the  previous  round.  To  determine  the  number  of 
sources,  the  optimal  cost  for  the  different  guesses  are  compared  and  the  final  solution 
is  the  total  number  of  individual  sources  that  yielded  the  lowest-magnitude  cost. 

Additionally,  the  cost  function  is  artificially  enlarged  by  addition  of  a  constant 
whenever  a  non-physical  solution  is  encountered  in  the  minimization  process,  to  force 
the  algorithm  back  within  the  range  of  physically  viable  source  parameter  values. 
This  enabled  us  to  essentially  create  a  constrained  minimization  algorithm  while 
retaining  the  (unconstrained)  quasi-Newton  solver  which  we  have  found  to  be  very 
effective  for  AIMS.  Parameter  values  classified  in  AIMS  as  non-physical  are:  (a) 
negative  source  mass;  (b)  source  estimates  at  locations  undetectable  by  the  sensors  (in 
the  case  of  multiple  sources);  and  (c)  negative  source  duration  (for  continuous 
sources). 
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Figure  5:  Summary  of  source  estimation  algorithm  implemented  in  AIMS 


Quantifying  Source  Estimate  Uncertainty 

Uncertainties  in  the  final  source  estimate  can  arise  from  two  main  sources:  1)  imperfect 
plume  inversion;  and  2)  non-ideal  data  and  dispersion  (forward  and  inverse)  modeling. 
These  uncertainties  are  quantified  using  the  following  metrics: 

1.  Solution  Quality  Index 

Given  ideal  data  and  perfect  dispersion  model,  the  accuracy  of  the  source  estimate  is 
limited  only  by  the  success  of  the  plume  inversion;  i.e.  the  magnitude  of  Cost(j3*)  in 
Equation  (2).  For  the  exact  solution,  Cost(j3*)  =  0  ;  therefore,  the  magnitude  of  this  term 
quantifies  the  error  in  the  final  solution.  Although  it  is  not  trivial  to  exactly  quantify  the 
error  in  j3*  from  the  error  in  Cost ,  it  is  clear  that  they  are  directly  proportional.  So,  a 
larger  cost  value  indicates  a  larger  inversion  error. 

In  practical  source  estimation  scenarios,  it  may  be  difficult  to  determine  the  accuracy  of 
the  source  estimate,  since  the  actual  source  parameters  are  not  known.  However,  it  is 
clear  that  solutions  with  larger  cost  values  are  generally  less  accurate  than  those  with 
smaller  costs.  Therefore,  we  define  a  simple  solution  quality  index ,  as  described  below, 
for  quantifying  the  accuracy  of  source  estimations  using  AIMS. 


It  is  interesting  to  note  the  behavior  of  the  cost  function  when  the  sources  are  moved 
away  from  the  observation  regions.  For  instance,  consider  a  modeled  release  occurring  at 
a  location  or  time  such  that  it  is  not  detected  by  the  sensors.  In  this  case 
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Y™°A  (j3)  =  0  Vi,  j  ;  and  Cost  =  const  =  Cost „  (see  equation  (2)).  It  is  clear  that  this  is  an 

inaccurate  source  estimate  and,  as  described  earlier,  AIMS  considers  this  an  inadmissible 
solution.  Thus,  it  is  convenient  to  define  the  solution  quality  index  as: 

9=l_cosm_'  (12) 

Cost ^ 

such  that  </>  =  1  for  a  perfect  source  estimation  involving  ideal  data  and  model,  (j)  =  0  for 
inadmissible  solutions  as  described  above,  and  in  general  0  <  (j)  <  1 .  In  addition  to 
imperfect  source  estimation,  other  factors  that  contribute  to  non-unit  (j)  are  data  errors 
and  model  errors  (i.e.  forward  dispersion  model’s  inability  to  accurately  represent 
reality).  A  useful  interpretation  for  (j)  is  that  it  is  proportional  to  the  fraction  of  the  total 
information  in  the  measurement  data  that  is  accurately  captured  by  the  forward  model 
Tmod  evaluated  at  the  source  estimate  [3 . 

2.  Data  Information  Content  Index 

As  described  earlier,  the  uniqueness  of  an  inverse  problem  solution  relies  on  “sufficient” 
information  in  the  observations.  Without  sufficient  data  information  content,  it  is  possible 
to  have  several  solutions  with  perfect  feasibility,  (j)  =  1 .  However,  one  cannot  rely  on 
such  estimates,  as  they  are  highly  uncertain. 

Again,  it  is  useful  to  consider  the  behavior  of  the  cost  function  in  relation  to  data 
information  content.  For  simplicity,  we  consider  a  single  point  concentration  sensor 
recording  useful  observations  over  a  given  time  period.  As  used  here,  useful  observations 
are  those  that  capture  the  temporal  evolution  of  the  passing  plume  at  a  given  location;  in 
other  words,  the  rise  in  concentration,  followed  by  the  plateau  and  the  decay.  Addition  of 
other  such  sensors  at  different  locations  provides  increasing  information  content  on  the 
spatial  evolution  of  the  plume  and  decreases  the  space  of  highly  feasible  solutions  for 
Equation  (2),  since  more  constraints  must  now  be  satisfied.  So,  the  cost  function  well 
becomes  narrower  with  increasing  data  information  content.  This  is  illustrated  with 
synthetic  data  on  the  following  figure.  A  similar  concept  has  been  observed  by  Sharan  et 
al.  [2009], 
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Release  Time  (hr) 


These  findings  motivated  the  definition  of  a  data  information  content  index: 

W)-x  ,t, 

*  I  _  9 


(13) 


where  A  is  a  user-set  change  in  feasibility  (we  use  0.1),  Lc  is  a  characteristic  length  scale 
(we  use  the  distance  between  a  source  and  the  nearest  sensor),  u  is  the  average 
horizontal  wind  velocity  and  trel  is  the  estimated  release  start  time.  One  interpretation  for 
this  metric  is  that  it  quantifies  the  change  in  (j)  over  a  characteristic  time  interval,  defined 
here  as  the  approximate  travel  time  from  source  to  nearest  sensor.  A  larger  value  denotes 
better  information  content. 


Modifications  to  SCIPUFF:  Modeling  Continuous  Releases 

SCIPUFF  forward  model  is  capable  of  simulating  a  continuous  release  from  a  source, 
however  there  are  several  difficulties  for  operating  TAF  on  this  section  of  the  code.  The 
difficulties  are  rooted  in  the  data  management  and  embedded  computational  syntax  used 
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and  create  discontinuities  for  the  application  of  the  chain  rule  by  the  automatic 
differentiation  code. 

In  order  to  successfully  include  continuous  releases  in  the  inverse  model  one  possible 
approach  would  have  been  to  identify  and  construct  workarounds  for  all 
incompatibilities,  this  could  have  potentially  been  very  time  intensive.  Another  possible 
solution  is  to  start  from  TAF-compatible  code  for  instantaneous  releases  and  reproduce 
SCIPUFF  continuous  release  algorithm  avoiding  known  TAF-unfriendly  constructs.  The 
second  approach  was  chosen  and  it  is  based  in  using  a  series  of  consecutive  instantaneous 
puff  releases  that  spread  and  overlap  downstream  approximating  a  continuous  source,  as 
shown  in  Figure  6.  The  algorithm  implementation  details,  which  highlight  the 
differences  between  SCIPUFF  original  code  and  the  TAF-friendly  implementation,  are 
presented  in  Figure  7. 


Figure  6:  Continuous  source  approximated  using  a  series  of  consecutive  instantaneous  puff  releases 
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Figure  7:  Algorithm  implementation  for  modeling  continuous  releases 


18 


1/17/2011 


Aerodyne  Research,  Inc. 


W9 1 1 NF-06-C-0 161 


The  TAF-friendly  implementation  replaces  the  complex  metrics  used  in  the  original  code 
for  determining  time  interval  between  puff  releases  with  a  heuristic  rule.  A  comparison 
between  the  predictions  of  both  codes  shows  that  the  new  AIMS  implementation 
reproduces  the  predictions  of  the  original  code. 

The  methods  described  above  have  been  implemented  in  AIMS  and  have  been 
successfully  applied  for  several  source  estimation  scenarios,  with  illustrative  results 
presented  in  the  next  section. 

Source  Location  Model  Testing  and  Validation 

AIMS  has  been  successfully  applied  to  several  problems  for  identification  and 
quantification  of  emission  sources.  A  series  of  applications  to  model-generated  data 
enabled  performance  testing  which  was  used  for  algorithm  development  and  validation. 
Subsequently,  AIMS  has  also  been  successfully  used  in  a  number  of  field  data 
applications. 

Application  to  Ideal  (model-generated)  Data 

In  this  section,  we  report  the  performance  of  AIMS  for  several  test  cases  involving 
single-source  and  multiple-source  events,  as  well  as  stationary  and  mobile  concentration 
measurement  data.  In  each  case,  we  consider  both  instantaneous  and  continuous  release 
types.  Finally,  we  investigate  the  effects  of  data  quantity  and  quality  by  varying  the 
number  and  spatial  resolution  of  sensors  and  by  artificially  adding  random  noise  to 
model- generated  sensor  data. 

The  release-detection  configurations  used  in  the  following  test  cases  were  based  on  the 
initial  plans  for  the  FUSION  Field  Trials  performed  in  September  of  2007  (FFT-07) 
[Platt  et  al.  (2008)].  The  basic  setup  consists  of  100  concentration  sensors  evenly 
distributed  over  a  1-km  square  grid  (see  Figure  8). 
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Figure  8:  FFT-07-inspired  configuration  showing  stationary  sensor  network,  wind  direction  and 
release  location. 

We  simulated  observational  sensor  data  by  recording  SCIPUFF-computed  concentrations 
at  specified  measurement  times  and  positions,  assuming  one  or  more  releases  of  SF6 
(chemically  inert)  gas.  We  also  assumed  uniform  and  constant  meteorological  conditions 
over  the  release-detection  domain.  In  each  case,  we  applied  AIMS  to  estimate  the 
(assumed  unknown)  corresponding  source  parameters  (location(s),  time(s), 
mass(es)/rate(s),  duration(s),  number  of  sources).  Note  that  by  using  model- generated 
data,  we  are  able  to  control  secondary  effects  like  model  and  data  uncertainties  in 
evaluating  the  performance  of  our  algorithm. 

1.  Single-source  scenarios 

As  a  simple  starting  point,  we  tested  the  method  for  cases  where  one  knows  that  only 
one  source  is  involved. 

Instantaneous  releases 

The  release  parameters  to  be  estimated  are  position,  time  and  mass.  We  applied  a 
constant  and  uniform  two-dimensional  wind  field  with  a  velocity  of  3.2m/s  and  direction 
346  degrees.  At  these  conditions,  we  simulated  a  lOOg  release,  3m  above  ground  at 
position  (-116.38°,  37.149°)  and  we  observed  the  concentration  profiles  shown  in  Figure 
9.  We  then  recorded  observational  data  for  each  sensor  at  eight  3-minute  time  intervals, 
starting  at  6  minutes  after  the  release  occurred. 
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t  =  6  min 
Ymax  =  2120  ppt 


t  =  9  min 
Ymax  =  823  ppt 


t  =  12  min 
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Ymax  =  0.95  ppt 


Figure  9:  SCIPUFF-generated  concentration  profiles  illustrating  data  sampling  process.  The  grid 
markers  indicate  stationary  sensor  positions,  where  sample  data  were  recorded.  Only  the  first  six 
sampling  times  are  shown  because  they  capture  the  puff  entry  into  and  exit  from  the  sensor  network. 


37.155 

37.15 

37.145 

37.14 

37.135 

37.13 

37.125 

-116.155  -116.145  -116.135  -116.125  -116.115 

Longitude  (E) 


♦  Sensor  locations 
O  Release  point  -  Actual 
gg  ▲  Release  point  -  First  guess 

X  Release  point  -  Final  estimate 


♦♦♦♦♦♦♦♦♦♦ 


Release  time 
(min.) 


Figure  10:  Source  estimation  results  for  a  single  instantaneous  release. 


Figure  10  shows  that  our  source  estimation  algorithm  very  accurately  estimates  the 
unknown  source  in  this  problem,  even  with  very  poor  starting  guesses.  In  fact,  the  source 
is  identified  exactly  in  this  case.  Note  that  we  report  release  altitudes  relative  to  ground 
level  and  release  times  relative  to  the  actual  release  time. 
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Continuous  releases 

Increasing  the  complexity  of  the  test  problem,  we  applied  AIMS  to  a  continuous-source 
case.  In  addition  to  the  release  time,  position  and  strength  estimated  for  the  unknown 
instantaneous  release,  here  we  must  also  estimate  the  unknown  release  duration. 

We  applied  the  same  meteorological  conditions  and  generated  observational  data  at  the 
same  sampling  times  described  for  the  previous  test  case.  Again,  assuming  an  unknown 
source,  we  applied  AIMS  to  estimate  the  source  parameters,  given  the  observational  data. 
The  results  in  Figure  11  demonstrate  that  we  are  able  to  accurately  estimate  unknown 
sources  involving  a  single  continuous  release. 


2.  Multiple-source  scenarios 

In  practice,  source  estimation  problems  can  involve  multiple  sources  contributing 
simultaneously  to  the  observational  dataset.  In  this  subsection,  we  investigate  the 
performance  of  our  algorithm  for  such  scenarios.  As  usual,  we  will  first  consider  the 
simpler  case  of  instantaneous  releases  before  we  treat  continuous  releases.  When  dealing 
with  multiple  sources,  one  must  estimate  an  additional  parameter  -  the  number  of 
sources.  For  simplicity,  we  first  assumed  that  all  releases  in  these  tests  are  known  to  have 
occurred  at  the  same  time.  We  found  that,  unlike  in  the  single-source  cases,  here  we 
needed  to  assume  known  release  altitudes  in  order  to  obtain  satisfactory  source  estimates. 
More  recent  advances  in  the  algorithm  development  have  eliminated  the  need  for  this 
restriction,  as  will  be  seen  later  in  the  sections  on  field  data  application. 

For  the  test  problems  discussed  in  the  current  section,  note  that  although  we  assumed  all 
other  source  parameters  to  be  unknown,  for  clarity  we  have  reported  source  location 
estimates  only.  Source  location  accuracies  were  representative  of  the  performance  for  the 
other  source  parameters. 

Instantaneous  releases 

We  considered  release  scenarios  involving  2  or  4  sources.  Figures  12  to  15  show  the 
source  estimates  assuming  1,  2,  3,  or  4  releases  in  each  case,  and  the  cost  function  value 
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for  each  estimate.  Figure  12  and  Figure  14  represent  cases  where  the  observational  data 
clearly  indicated  multiple  separate  events,  while  in  Figure  13  and  Figure  15  all  puffs 
overlapped  significantly  over  the  sensor  network  and  it  was  not  clear  from  the 
concentration  profiles  that  there  were  multiple  releases.  Measurement  times  and 
meteorological  conditions  were  identical  to  those  applied  for  previous  cases,  except  in 
Figure  13  where  we  applied  a  0-degree  wind  direction. 


Longitude  (E)  Estimated  number  of  releases 

Figure  12:  Source  estimation  results  for  two  simultaneous  instantaneous  releases  -  minimal  puff 
overlap. 
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Figure  13:  Source  estimation  results  for  two  simultaneous  instantaneous  releases  -  significantly 
overlapping  puffs. 
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Figure  14:  Source  estimation  results  for  four  simultaneous  instantaneous  releases  -  minimal  puff 
overlap. 
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Figure  15:  Source  estimation  results  for  four  simultaneous  instantaneous  releases  -  significantly 
overlapping  puffs. 


In  general,  our  algorithm  correctly  identified  the  numbers  and  locations  of  all  sources. 
Not  surprisingly,  cases  involving  puff  overlaps  or  more  release  points  were  more 
challenging.  Note  in  Figure  13  that  the  cost  function  was  lowest  when  we  assumed  four 
sources,  even  though  only  two  sources  were  actually  present.  Closer  examination  reveals 
that  the  method  estimated  two  identical  releases  at  each  of  the  two  actual  release  points 
and  accurately  estimated  the  total  masses  -  effectively  accurately  describing  the  two 
actual  sources.  Although  the  result  is  ultimately  accurate,  we  are  surprised  by  this 
behavior  and  we  hope  to  gain  a  better  understanding  through  some  ongoing  studies. 

Continuous  releases 

We  repeated  the  numerical  experiment  described  in  Figure  13,  this  time  applying  two 
continuous  sources  of  equal  duration.  The  result  is  shown  in  Figure  16.  Here,  we  found 
that  we  also  needed  to  assume  known  release  durations  (in  addition  to  release  altitudes)  in 
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order  to  obtain  satisfactory  source  estimates.  Subsequent  advances  in  the  algorithm 
development  eliminated  this  limitation,  as  will  be  seen  later  when  field  data  application  is 
discussed. 


37.160 


37.155 

37.150 

Z 

0 

T3  37.145 

3 

CO 

-I 

37.140 


37.135 


37.130 

-116.15  -116.14  -116.13  -116.12  -116.11  -116.1 

Longitude  (E) 

Figure  16:  Source  estimation  results  for  two  simultaneous  continuous  releases  -  significant  plume 
overlap. 

Another  interesting  observation  here  is  that  although  there  were  actually  only  two 
releases,  the  cost  function  was  lowest  for  the  three-release  configuration.  However,  one 
of  the  releases  was  positioned  such  that  it  essentially  never  impacted  the  sensors  - 
effectively  (and  accurately)  describing  two  sources. 

3.  Effects  of  data  quantity 

All  of  the  cases  considered  so  far  have  applied  observational  data  from  100  sensors  in  a 
1  -km  square  grid.  In  practice,  one  would  most  often  have  much  fewer  sensors,  with  lower 
spatial  resolution.  To  investigate  the  impact  of  such  a  decline  in  data  quantity  on  the 
performance  of  AIMS,  we  repeated  the  single-source  numerical  experiments  described 
earlier,  gradually  decreasing  the  number  of  sensors. 

Figure  17  and  Figure  18  show  that  AIMS  is  robust  to  low  data  quantities.  Not 
surprisingly,  continuous  sources  appear  to  require  more  sensor  data  for  accurate  source 
estimation  than  do  instantaneous  sources. 
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Figure  17:  Source  estimation  results  for  single  instantaneous  source  -  gradually  decreasing  data 
quantity. 
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Figure  18:  Source  estimation  results  for  single  continuous  source  -  gradually  decreasing  data 
quantity. 


4.  Effects  of  data  quality  (noisy  sensor  data) 

Finally,  we  investigated  the  impact  of  data  quality  on  the  performance  of  AIMS. 
Since  all  sensor  data  were  model-generated,  we  artificially  added  random  noise  to 
simulate  measurement  uncertainties.  We  applied  random  errors  from  a  normal 
distribution  in  the  range  ±(3%  of  actual  concentration  value  +  15)  ppt,  which  is 
representative  of  measurement  uncertainties  in  practice.  AIMS  accurately  estimated  the 
sources  using  100,  25,  and  9  sensors,  and  the  impact  of  the  noise  was  first  observed  with 
4  sensors  (see  Figures  19  and  20). 


26 


1/17/2011 


Aerodyne  Research,  Inc.  W91 1NF-06-C-0161 


37.155 

37.150 

37.145 

37.140 

37.135 

37.130 

37.125 

-116.15  -116.14  -116.13  -116.12 


X 

♦  Sensor  locations 
o  Release  point  -  Actual 
▲  Release  point  -  First  guess 

X  Final  estimate  -  Ideal  data 

X  Final  estimate  -  Noisydata 

♦  X 

♦ 

A 

♦ 

♦ 

Longitude  (E) 


Figure  19:  Source  estimation  for  single  instantaneous  source  -  ideal  versus  noisy  sensor  data. 


These  results  indicate  that  our  source  estimation  method  is  robust  to  noisy  data,  but  is 
more  sensitive  to  data  quantity  with  noise  than  without;  perhaps  more  data  are  needed  to 
“regularize”  the  effects  of  noise. 
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Figure  20:  Source  estimation  for  single  continuous  source  -  ideal  versus  noisy  sensor  data. 


5.  Multi-Measurement  Data:  Beyond  Stationary  Concentration  Sensors 


All  of  the  previous  examples  involved  concentration  measurements  using  stationary 
sensors.  However,  one  of  the  goals  of  AIMS  is  to  enable  integration  of  multiple 
measurement  types  in  order  to  maximize  information  content  for  source  estimation.  In 
addition  to  stationary  concentration  measurements,  the  current  version  of  AIMS  is  able  to 
process  mobile  concentration  measurements  for  source  estimation.  This  is  demonstrated 
in  Figure  21. 


In  this  example,  mobile  sensor  data  are  obtained  by  simulating  a  mobile  instrument 
recording  ground-level  propene  concentration  data  while  driving  around  a  4-km  region 
surrounding  several  industrial  facilities.  The  goal  is  to  identify  the  source  of  propene  in 
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this  scenario,  assuming  that  the  actual  source  is  unknown.  As  shown  in  Figure  21,  AIMS 
successfully  identifies  the  source,  using  data  from  the  mobile  concentration 
measurements. 
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Figure  21:  AIMS  successfully  identified  the  unknown  source  in  this  example  involving  mobile 
concentration  measurements. 


AIMS  Application  to  Field  Data 

In  addition  to  model-generated  data  shown  in  the  previous  examples,  AIMS  has  also  been 
successfully  applied  to  field  data,  for  controlled-release  (experimental)  scenarios  as  well 
as  for  real  unknown  source  scenarios.  The  results  of  these  AIMS  applications  are 
discussed  below. 

FFT-07  Controlled  Releases 

A  field  trial  campaign,  called  FUsing  Sensor  Information  from  Observing  Networks 
(FUSION)  Field  Trial  2007  (or  “FFT-07”)  was  performed  in  2007,  funded  by  the 
Defense  Threat  Reduction  Agency  (DTRA).  FFT-07  was  a  short-range  (approximately  1- 
2  km)  controlled  dispersion  test  designed  to  collect  data  to  support  development  of 
prototype  algorithms  for  source  estimation.  A  range  of  release  scenarios  were  considered: 
daytime  and  nighttime  emissions,  single  and  multiple  sources,  instantaneous  and 
continuous  sources.  The  details  of  FFT07  are  provided  in  Platt  and  Deriggi  [2010].  In 
summary,  the  data  provided  to  algorithm  developers  involved  104  cases  of  propene  gas 
releases,  with  4  or  16  stationary  sensors  used  for  data  sampling  in  each  case.  The 
unknowns  were:  number  of  sources,  source  locations,  strengths,  times  and  durations. 
AIMS  was  demonstrated  as  a  robust  and  reliable  model,  consistently  in  the  top  rankings 
for  all  scenarios  in  an  independent  inter-comparison  study  by  the  Institute  for  Defense 
Analyses  involving  eight  inverse  plume  dispersion  algorithms  (Platt  et  al.  2010).  On 
average,  AIMS  source  estimates  were  within  100m  of  the  actual  sources.  These  results 
are  summarized  on  the  figures  below. 
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Figure  22:  Distance  between  AIMS-estimated  source  location  and  actual  FFT07  source  location, 
averaged  over  all  sources  in  each  case.  Results  are  grouped  by  number  of  sources  and  source  types. 
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Figure  23:  Average  distance  between  estimated  source  location  and  actual  FFT07  source  location  for 
different  participating  algorithms.  The  four  plots  correspond  to  different  types  of  releases:  Single 
day  puff,  single  day  continuous  source,  double  night  puffs,  double  night  continuous  sources.  Smaller 
bars  indicate  better  predictions  as  the  estimated  location  is  closer  to  the  real  location. 
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Figure  24:  Average  distance  between  estimated  source  location  and  actual  FFT07  source  location  for 
different  participating  algorithms.  The  four  plots  correspond  to  different  types  of  releases:  Single 
night  puff,  single  night  continuous  source,  double  day  puffs,  double  day  continuous  sources.  Smaller 
bars  indicate  better  predictions  as  the  estimated  location  is  closer  to  the  real  location. 


In  this  application,  the  best  results  were  obtained  for  stable  wind  conditions  and  the  most 
challenging  runs  involved  highly  varying  winds  or  narrow  sensor  signals,  as  illustrated  in 
the  following  figures. 
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Figure  25:  Example  FFT07  case  with  stable  wind  velocity  and  direction  (Case  26).  Good  agreement  is 
shown  between  model  and  data  concentration  profiles.  The  estimated  solution  approximates  well  the 
real  release  parameters. 
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Figure  26:  Example  FFT07  case  with  highly  variable  wind  velocity  and  direction  (Case  47). 
Agreement  between  model  and  data  concentration  profiles  is  not  optimal.  The  estimated  solution 
might  not  approximate  the  real  release  parameters  as  well  as  in  the  case  of  stable  winds. 
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Figure  27:  Example  FFT07  challenging  case  with  narrow  concentration  signals  (Case  41).  Narrow 
signals  indicate  different  release  times  for  each  puff.  A  good  initial  guess  for  location  and  release  time 
of  each  event  are  very  important  because  the  cost  penalty  for  not  predicting  a  peak  with  the 
calculated  solution  is  not  too  high  (only  a  few  concentration  points  in  the  peak). 


The  Study  of  Houston  Atmospheric  Radical  Precursors  (SHARP) 


SHARP  was  a  large  multi-organizational  air  quality  analysis  campaign  performed  in 
2009,  with  one  of  its  goals  being  identification  and  quantification  of  key  air  pollutant 
sources.  Along  with  other  instruments,  the  Aerodyne  Mobile  Laboratory  (pictured  below) 
was  used  for  continuous  sampling,  to  monitor  the  air  quality  in  order  to  detect  unwanted 
industrial  process  emissions. 


Figure  28:  The  Aerodyne  Research  Inc.  Mobile  Laboratory. 


The  ARI  mobile  laboratory  is  outfitted  with  many  instruments,  several  of  which  are 
developed  and  built  at  ARI  using  proprietary  technology,  for  obtaining  high-resolution, 
real-time  measurements  of  gases,  particles  and  aerosols  [Kolb  et  al.  (2004)].  An  onboard 
anemometer  records  wind  speed  and  direction  data.  A  significant  advantage  over 
stationary  sensors  is  that  one  can  obtain  a  wealth  of  useful  and  relevant  data  with  a  single 
sensor  by  following  observed  (non-lethal)  plumes  with  the  mobile  lab  (with  a  tailored 
truck  path). 
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Note  that  this  source  estimation  problem  is  complicated  by  the  fact  that  there  is  a  high 
density  of  industrial  activity  in  the  Houston  area;  therefore  there  are  several  potential 
sources.  AIMS  was  successfully  used  to  determine  the  emission  rates,  locations  and  times 
of  measured  combustion-source  pollutants  in  the  Houston  area.  The  solution  quality 
index  defined  earlier  (equation  12)  was  used  to  evaluate  the  source  estimation  solutions 
in  these  cases,  since  the  real  sources  were  unknown.  Solution  qualities  for  all  AIMS 
estimates  were  no  less  than  75%.  Figure  29  shows  an  example  application  of  AIMS  with 
mobile  concentration  measurements  during  the  SHARP  campaign. 


UTM  Zone  15,  Easting  (km) 

Figure  29:  Aerodyne  mobile  laboratory  measurements  of  benzene  (blue  triangles)  and  the 
responsible  emission  source  magnitude  and  location  (yellow  star)  identified  using  AIMS.  The 
observed  wind  direction  is  indicated  by  the  red  arrow.  The  measurements  were  taken  on  May  7,  2009 
in  Texas  City  during  the  Houston  SHARP  campaign. 


Summary 

Aerodyne  Research  Inc.  (ARI)  has  developed  and  demonstrated  an  algorithm  for  source 
estimation,  called  AIMS  (“Aerodyne  Inverse  Modeling  System”).  In  general  terms,  AIMS 
applies  a  variational  approach  for  source  estimation:  a  cost  function  is  defined  that 
quantifies  the  mismatch  between  all  observations  and  the  corresponding  model 
predictions  resulting  from  a  given  set  of  trial  source  parameters;  then,  the  optimal  set  of 
source  parameters  is  identified  as  the  values  for  which  the  cost  function  is  minimized: 

Cost  {(d)  =  II Data  -  Model  (/?)|| 

(14) 

fd*  =  arg  min  Cost  {(d) 

P 

where  [d  is  the  set  of  unknown  source  parameters;  and  (d*  is  the  value  of  [d  that  yields 
forward  model  predictions  that  are  most  consistent  with  the  data. 
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Indeed,  in  the  theoretical  limit  of  ideal  data  and  models,  the  global  minimum  of  this  cost 
function  exists  at  the  set  of  parameters  that  is  most  likely  responsible  for  the 
observational  data.  The  two  main  challenges  of  variational  approaches  in  practice  involve 
successfully  locating  the  (global)  minimum  of  the  cost  function  and  dealing  with  non¬ 
ideal  data  and  models.  The  former  challenge  demands  careful  definition  of  the  cost 
function  and  the  use  of  a  robust  minimization  algorithm.  The  latter  requires  awareness  of 
(and  accounting  for)  artificial  offsets  in  the  location  of  the  minimum,  due  to  non-ideal 
data  and  models. 

AIMS  takes  as  input  all  available  observational  data  and  optionally  any  prior  knowledge 
of  the  source  parameters.  The  output  is  the  set  of  source  parameters  that  best  describes 
the  observations,  including  number  of  sources,  emission  rates,  locations,  start  and  end 
times.  AIMS  is  also  designed  to  include  an  a  posteriori  assessment  of  its  solution  quality, 
providing  useful  feedback  on  how  much  confidence  to  put  in  a  particular  solution  and  in 
what  ways  the  solution  quality  might  be  improved.  A  novel  feature  in  AIMS  is  the 
ability  to  integrate  multiple  observation  types  in  order  to  maximize  information  content 
for  source  estimation.  This  capability  has  been  demonstrated  for  datasets  from  stationary 
and  mobile  sensors. 

AIMS  provides  a  significantly  improved  capability  to  protect  the  warfighter  and  civilians 
through  successful  identification  and  quantification  of  unknown  hazardous  atmospheric 
releases. 


Recommendations  for  Future  Research  &  Development 

1.  Sensor  technologies  have  evolved  and  continue  to  evolve  beyond  point  concentration 
data.  Expanding  the  capability  of  source  location  algorithms  to  accommodate  new 
types  of  observations  has  the  potential  to  not  only  increase  their  applicability  but  also 
their  accuracy  since  more  available  information  can  be  integrated  for  source 
estimation.  This  capability  can  be  implemented  in  AIMS  in  the  future. 

2.  Recent  advances  in  computational  science  and  technology  have  created  new 
opportunities  for  code  parallelization  that  can  be  exploited  to  reduce  source  location 
algorithm  run  times  and  enable  more  rapid,  real-time  threat  assessment.  In  particular, 
graphics  processing  units  (GPUs)  formerly  reserved  for  computer  graphics  rendering 
have  evolved  into  highly  cost-efficient  tools  for  highly  parallel  scientific 
computation.  Several  components  of  the  source  location  algorithm  developed  here 
involve  large  amounts  of  computations  that  can  each  be  performed  independently  and 
would  be  well-suited  for  parallelization.  This  feature  can  be  implemented  in  AIMS  in 
the  future  to  increase  its  speed  of  source  location. 

3.  Many  real-life  release  scenarios  involve  time-dependent  sources,  either  in  mass  flow 
rate,  position  or  both.  Expanding  the  capability  of  source  estimation  algorithms  like 
AIMS  to  characterize  time-varying  sources  would  be  very  valuable. 
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4.  AIMS  can  be  extended  to  automatically  provide  guidance  on  optimal  sensor 
placement.  This  would  be  especially  relevant  for  those  cases  when  the  data 
information  content  index  indicates  that  the  source  estimates  are  highly  uncertain. 
This  new  capability  would  be  used  to  plan  sensor  locations  in  preparation  for 
potential  events. 
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